Amendments to the Claims:

This listing of claims replaces all prior versions and listings of claims in the application: 
Listing of Claims:



What is claimed is: 

1. (Currently Amended) A method for training a Chinese language model from
Chinese character inputs, comprising:

[[extracting unknown character strings from a set of Chinese inputs;
determining valid words from the unknown character strings by comparing
frequencies of occurrences of the unknown character strings with frequencies of occurrences of
individual characters of the unknown character string;]]

segmenting the Chinese characters into valid words and unknown character 
strings, wherein the valid words are entries in a Chinese dictionary and the unknown character 
strings are not entries in the Chinese dictionary, and wherein the unknown character strings 
comprise Chinese characters;

for each unknown character string,

determining a corresponding first frequency of occurrence for the 
unknown character string and a corresponding second frequency of occurrence for 
each of the Chinese characters in the unknown character string; 

comparing the first frequency of occurrence to the second frequency of 
occurrence to determine an information gain value;

comparing the information gain value to a threshold;
identifying the character string as a new valid word when the information
gain is greater than the threshold;

adding the new valid word to the Chinese dictionary to create an updated Chinese
dictionary;

resegmenting the Chinese characters into Chinese words, wherein the Chinese 
words are entries in the updated Chinese dictionary; and

generating a transition matrix of conditional probabilities for predicting a word 
given a context based on the resegmenting.
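
For illustration only, the frequency comparison recited in claim 1 can be sketched as follows. The claim does not define how the information gain value is computed; this sketch assumes a log ratio between the observed probability of the string and the probability expected if its characters occurred independently. The names `information_gain` and `find_new_words`, the sample corpus, and the threshold are hypothetical.

```python
import math
from collections import Counter


def information_gain(string, corpus):
    """Assumed gain: log(P(string) / product of P(char) for each char).

    The claim only says the two frequencies are compared; this particular
    formula is an assumption made for the sketch.
    """
    total = len(corpus)
    string_count = corpus.count(string)  # first frequency (non-overlapping)
    if string_count == 0:
        return float("-inf")
    char_counts = Counter(corpus)  # second frequencies, one per character
    p_string = string_count / total
    p_independent = math.prod(char_counts[c] / total for c in string)
    return math.log(p_string / p_independent)


def find_new_words(unknown_strings, corpus, threshold):
    """Keep the unknown strings whose gain is greater than the threshold."""
    return [s for s in unknown_strings
            if information_gain(s, corpus) > threshold]


# Hypothetical usage: the accepted strings would then be added to the
# dictionary before the text is resegmented.
corpus = "谷歌很好的人谷歌的事的谷歌的"
new_words = find_new_words(["谷歌", "的人"], corpus, threshold=1.4)
```

Under this assumed formula, a string that recurs more often than its individual characters would predict scores high, while a chance pairing of a very common character with a neighbor scores lower.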

2. (Canceled) 

3. (Currently Amended) The method of claim 1, wherein the transition matrix of
conditional probabilities is generated based on n-gram counts generated from the Chinese
character inputs where n > 1.

4. (Currently Amended) The method of claim 3, wherein the n-gram counts include 
the counts of n-tuples of adjacent and non-adjacent words in the set of Chinese character inputs. 

5. (Previously Presented) The method of claim 3, wherein the n-gram counts 
include the number of occurrences of each n-word sequence. 
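
As a minimal sketch of how n-gram counts can yield a transition matrix of conditional probabilities, the example below uses n = 2 and maximum-likelihood estimates over adjacent words only. The function name `bigram_transition_matrix` and the sample sentences are hypothetical, and the claims do not prescribe this estimator; counts over non-adjacent words, as in claim 4, are not covered.

```python
from collections import Counter, defaultdict


def bigram_transition_matrix(sentences):
    """Estimate P(next_word | previous_word) from segmented sentences.

    `sentences` is a list of word lists, i.e. text that has already been
    resegmented with the updated dictionary.
    """
    pair_counts = Counter()     # occurrences of each 2-word sequence
    context_counts = Counter()  # occurrences of each word as a context
    for words in sentences:
        for prev, nxt in zip(words, words[1:]):
            pair_counts[(prev, nxt)] += 1
            context_counts[prev] += 1
    matrix = defaultdict(dict)
    for (prev, nxt), count in pair_counts.items():
        matrix[prev][nxt] = count / context_counts[prev]
    return dict(matrix)


# Hypothetical usage with three already-segmented sentences.
sentences = [["我", "爱", "北京"], ["我", "爱", "上海"], ["我", "去", "北京"]]
matrix = bigram_transition_matrix(sentences)
```

Each row of the resulting nested dictionary sums to one, so `matrix[prev][nxt]` can be read directly as the probability of `nxt` given the context `prev`.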

6. (Canceled) 

7. (Currently Amended) The method of claim 1, wherein the set of Chinese
character inputs includes at least one of user Chinese inputs and a set of Chinese input 
documents. 

8. (Currently Amended) The method of claim 7, wherein the set of Chinese 
character inputs includes a set of user Chinese character queries to a web search engine. 

9. (Currently Amended) A computer program product for use in conjunction with a
computer system, the computer program product comprising a computer readable storage 
medium on which are stored instructions executable on a computer processor, the instructions 
including: 

[[extracting unknown character strings from a set of Chinese inputs;
determining valid words from the unknown character strings by comparing
frequencies of occurrence of the unknown character strings with frequencies of occurrence of
individual characters of the unknown character string;]]

segmenting the Chinese characters into valid words and unknown character 
strings, wherein the valid words are entries in a Chinese dictionary and the unknown character 
strings are not entries in the Chinese dictionary, and wherein the unknown character strings
comprise Chinese characters;

for each unknown character string, 

determining a corresponding first frequency of occurrence for the 
unknown character string and a corresponding second frequency of occurrence for 
each of the Chinese characters in the unknown character string; 

comparing the first frequency of occurrence to the second frequency of 
occurrence to determine an information gain value;

comparing the information gain value to a threshold;
identifying the character string as a new valid word when the information
gain is greater than the threshold;

adding the new valid word to the Chinese dictionary to create an updated Chinese
dictionary;

resegmenting the Chinese characters into Chinese words, wherein the Chinese 
words are entries in the updated Chinese dictionary; and

generating a transition matrix of conditional probabilities for predicting a word 
given a context based on the resegmenting.

10. (Currently Amended) A system for training a Chinese language model, comprising:

a segmenter configured to [[segment unknown character strings from a set of
Chinese inputs]] segment the Chinese characters into valid words and unknown character strings
and resegment the Chinese characters into Chinese words, wherein the valid words are entries in
a Chinese dictionary, the unknown character strings are not entries in the Chinese dictionary, and
the Chinese words are entries in an updated Chinese dictionary, and wherein the unknown
character strings comprise Chinese characters;

a new word analyzer configured to [[determine valid words from the unknown
character strings by comparing frequencies of occurrence of the unknown character strings with
frequencies of occurrence of individual characters of the unknown character string]] determine a
corresponding first frequency of occurrence for the unknown character string and a
corresponding second frequency of occurrence for each of the Chinese characters in the
unknown character string, compare the first frequency of occurrence to the second frequency of
occurrence to determine if the character string is a new valid word based on a threshold, and add
the new valid word to the Chinese dictionary to create the updated Chinese dictionary; and

a Chinese language model training module configured to generate a transition 
matrix of conditional probabilities for predicting a word string given a context based on the 
resegmenting.

11. (Canceled) 

12. (Currently Amended) The system of claim 10, wherein the new word analyzer is 
further configured to generate n-gram counts from the Chinese character inputs where n > 1 and
to generate the transition matrix of conditional probabilities based on the n-gram counts. 



13. (Currently Amended) The system of claim 12, wherein the n-gram counts include
the counts of n-tuples of adjacent and non-adjacent words in the set of Chinese character inputs.

14. (Previously Presented) The system of claim 12, wherein the n-gram counts include
the number of occurrences of each n-word sequence. 

15. (Canceled) 

16. (Currently Amended) The system of claim 10, wherein the set of Chinese character
inputs includes at least one of user Chinese inputs and a set of Chinese documents. 

17. (Currently Amended) The system of claim 16, wherein the set of Chinese character
inputs includes a set of user Chinese character queries to a web search engine. 

18. (Withdrawn) A method for translating a pinyin input to at least one Chinese
character string, comprising: 

generating a set of character strings from the pinyin input, each character string having a 
weight associated therewith indicating a likelihood that the character string corresponds to the 
pinyin input, the generating includes utilizing a Chinese dictionary including words extracted 
from a set of Chinese inputs and a language model trained based on the set of Chinese inputs. 

19. (Withdrawn) The method of claim 18, wherein the set of Chinese inputs includes
at least one of user Chinese inputs and a set of Chinese documents. 

20. (Withdrawn) The method of claim 19, wherein the set of Chinese inputs includes
a set of user Chinese queries to a web search engine. 

21. (Withdrawn) The method of claim 18, further comprising:

prior to the generating, filtering out non-alphabetic characters from the pinyin input and 
storing their respective positions within the pinyin input; and 

after the generating, merging each of the character strings with the non-alphabetic
characters in positions corresponding to their stored positions. 

22. (Withdrawn) The method of claim 18, further comprising:

prior to the generating, identifying an ambiguous word in the pinyin input, the ambiguous 
word being selected from a database of n-grams that are valid both in non-pinyin and in pinyin; 
and 

analyzing context words of the user input to selectively classify the pinyin input as non- 
pinyin and pinyin, wherein the generating is performed only if the pinyin input is classified as 
pinyin. 

23. (Withdrawn) The method of claim 18, further comprising generating a plurality of
pinyin candidates from the pinyin input, wherein the generating includes generating a set of 
character strings for each pinyin candidate. 

24. (Withdrawn) The method of claim 18, further comprising sorting and ranking the
set of character strings according to the likelihood that the pinyin input corresponds to the 
character string. 

25. (Withdrawn) The method of claim 18, wherein the generating includes 
performing a Viterbi algorithm utilizing the Chinese dictionary including words extracted from 
the set of Chinese inputs and the language model based on the set of Chinese inputs. 
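
The Viterbi step of claim 25 can be illustrated with a small dynamic-programming sketch. It assumes the pinyin input has already been split into syllables, that the dictionary maps syllable tuples to candidate words, and that the language model is a table of bigram probabilities. The function name `viterbi_decode`, the start symbol `"<s>"`, the `floor` probability for unseen bigrams, and all sample data are hypothetical.

```python
import math


def viterbi_decode(syllables, dictionary, bigram, floor=1e-6):
    """Return (best character string, log probability), or None.

    dictionary: {tuple of syllables: [candidate words]}   (assumed layout)
    bigram:     {(previous word, word): P(word | previous word)}
    """
    n = len(syllables)
    # best[i] maps a last word to (log prob, word path) covering syllables[:i]
    best = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(end):
            key = tuple(syllables[start:end])
            for word in dictionary.get(key, []):
                for prev, (score, path) in best[start].items():
                    cand = score + math.log(bigram.get((prev, word), floor))
                    # keep only the highest-scoring path ending in this word
                    if word not in best[end] or cand > best[end][word][0]:
                        best[end][word] = (cand, path + [word])
    if not best[n]:
        return None  # no sequence of dictionary words covers the input
    score, path = max(best[n].values(), key=lambda item: item[0])
    return "".join(path), score


# Hypothetical dictionary and bigram model.
dictionary = {("wo",): ["我"], ("ai",): ["爱", "矮"],
              ("bei", "jing"): ["北京", "背景"]}
bigram = {("<s>", "我"): 0.9, ("我", "爱"): 0.5, ("我", "矮"): 0.01,
          ("爱", "北京"): 0.4, ("爱", "背景"): 0.05}
result = viterbi_decode(["wo", "ai", "bei", "jing"], dictionary, bigram)
```

The log probability returned with the best string could serve as the weight associated with a character string; producing a ranked set of candidates, as in claim 24, would require keeping more than one path per state.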

26. (Withdrawn) The method of claim 18, further comprising:

performing a search for a character string selected by a user from the set of character strings. 

27. (Withdrawn) The method of claim 18, wherein the search is a web search
performed by a search engine. 

28. (Withdrawn) The method of claim 18, further comprising:
extracting unknown character strings from the set of Chinese inputs; 

determining valid words from the unknown character strings by comparing frequencies of 
occurrence of the unknown character strings with frequencies of occurrence of individual 
characters of the unknown character string to generate the Chinese dictionary, the dictionary 
includes a mapping of the words to their corresponding pinyin; and 

generating the language model for predicting a word string given a context. 

29. (Withdrawn) A computer program product for use in conjunction with a computer 
system, the computer program product comprising a computer readable storage medium on 
which are stored instructions executable on a computer processor, the instructions including: 

generating a set of character strings from the pinyin input, each character string having a 
weight associated therewith indicating the likelihood that the character string corresponds to the 
pinyin input, the generating includes utilizing a Chinese dictionary including words extracted 
from a set of Chinese inputs and a language model trained based on the set of Chinese inputs. 

30. (Withdrawn) A system for translating a pinyin input to at least one Chinese 
character string, comprising: 

a pinyin-word decoder configured to generate a set of character strings from the pinyin
input, each character string having a weight associated therewith indicating the likelihood that 
the character string corresponds to the pinyin input, the pinyin-word decoder being further 
configured to utilize a Chinese dictionary that includes words extracted from a set of Chinese 
inputs and a language model trained based on the set of Chinese inputs. 

31. (Withdrawn) The system of claim 30, wherein the set of Chinese inputs includes
at least one of user Chinese inputs and a set of Chinese documents. 

32. (Withdrawn) The system of claim 30, further comprising a pinyin candidate
generator configured to generate a plurality of pinyin candidates from the pinyin input, wherein 
the pinyin-word decoder is configured to generate a set of character strings for each pinyin 
candidate. 

33. (Withdrawn) The system of claim 30, further comprising a sorting and ranking 
module configured to sort and rank the set of word strings according to the likelihood that the 
pinyin input corresponds to the character string. 

34. (Withdrawn) The system of claim 30, wherein the pinyin-word decoder is further 
configured to execute a Viterbi algorithm utilizing the Chinese dictionary including words 
extracted from the set of Chinese inputs and the language model based on the set of Chinese 
inputs. 

35. (Withdrawn) The system of claim 30, further comprising: 

a segmenter configured to segment unknown character strings from the set of Chinese
inputs;

a new word analyzer configured to determine valid words from the unknown character 
strings by comparing frequencies of occurrence of the unknown character strings with 
frequencies of occurrence of individual characters of the unknown character string; and 

a Chinese language model training module configured to generate a transition matrix of 
conditional probabilities for predicting a word string given a context. 

36. (Withdrawn) A pinyin classifier for classifying a user input, comprising:
a database of words that are valid both in non-pinyin and in pinyin; and
a classification engine configured to identify an ambiguous word in the user input
selected from the database of words and to analyze context words of the user input to selectively
classify the user input as non-pinyin or as pinyin.

37. (Withdrawn) The pinyin classifier of claim 36, wherein the classification engine is
further configured to compute likelihoods of possible Chinese queries that may be generated 
from ambiguous query and to classify the user input as pinyin input if at least one of the 
likelihoods computed is above a predetermined threshold. 

38. (Withdrawn) The pinyin classifier of claim 37, wherein the classification engine is 
configured to compute the likelihoods of possible Chinese queries if the user input is unresolved
after the classification engine analyzes the context words. 

39. (Withdrawn) The pinyin classifier of claim 36, wherein the database of words that 
are valid both in non-pinyin and in pinyin is extracted from commonly occurring words in non- 
pinyin user queries. 

40. (Withdrawn) A method for pinyin classification of a user input, comprising: 
identifying an ambiguous word in the user input, the ambiguous word being selected
from a database of n-grams that are valid both in non-pinyin and in pinyin; and

analyzing context words of the user input to selectively classify the user input as non- 
pinyin or as pinyin. 

41. (Withdrawn) The pinyin classification method of claim 40, further comprising:
computing likelihoods of possible Chinese queries that may be generated from
ambiguous query; and

classifying the user input as pinyin input if at least one of the likelihoods computed is 
above a predetermined threshold. 

42. (Withdrawn) The pinyin classification method of claim 41, wherein the
computing and classifying are performed if the user input is unresolved after the analyzing. 

43. (Withdrawn) The pinyin classification method of claim 40, wherein the database
of words that are valid both in non-pinyin and in pinyin is extracted from commonly occurring 
words in non-pinyin user queries. 

44. (Withdrawn) A method for presenting possible translations of a user input, 
comprising: 

providing a hyperlink for each possible translation of the user input, the user input and 
each possible translation of the user input being in different languages or language formats. 

45. (Withdrawn) The method for presenting possible translations of claim 44, wherein 
the user input is in pinyin and each of the possible translations is in Hanzi. 

46. (Withdrawn) The method for presenting possible translations of claim 44, further 
comprising: 

providing at least one other hyperlink corresponding to a spelling correction of the user
input.

47. (Withdrawn) The method for presenting possible translations of claim 44, wherein 
the hyperlink is to a web search of the corresponding possible translation of the user input. 



